NAVAL POSTGRADUATE SCHOOL 
Monterey, California




THESIS 



THE AGGREGATION OF POPULATION GROUPS 
TO IMPROVE THE PREDICTABILITY OF 
MARINE CORPS OFFICER ATTRITION ESTIMATION 

by 

Randall W. Larsen 


December 1987 



Thesis Advisor: Robert R. Read



Approved for public release; distribution is unlimited 






REPORT DOCUMENTATION PAGE

Report Security Classification: UNCLASSIFIED
Distribution/Availability of Report: Approved for public release;
    distribution is unlimited
Name of Performing Organization: Naval Postgraduate School
Office Symbol: Code 54
Address: Monterey, California 93943-5000
Name of Monitoring Organization: Naval Postgraduate School
Address: Monterey, California 93943-5000
Title: THE AGGREGATION OF POPULATION GROUPS TO IMPROVE THE
    PREDICTABILITY OF MARINE CORPS OFFICER ATTRITION ESTIMATION
Personal Author: Larsen, Randall W.
Type of Report: Master's Thesis
Date of Report: 1987, December
Page Count: 110
Subject Terms: Loss Rate Cluster Analysis; Officer Attrition Rate

Abstract:

This thesis presents an algorithm for the aggregation of low inventory
categories (small cells) which characterize the population of Marine Corps,
unrestricted, active duty officers. The basis for aggregating these small
cells is the degree of homogeneity of historical attrition rates. The
techniques of hierarchical cluster analysis are applied to the small cell
problem in lieu of existing functional and organizational structures.

This research demonstrates the adaptability of cluster analysis to loss
rate aggregation and provides a shell for more refined model applications.
Further, statistical stability and attrition rate homogeneity have been
introduced to allow for subsequent application of shrinkage type parameter
estimation methods associated with the development of an officer attrition
rate generator.

Distribution/Availability of Abstract: Unclassified/Unlimited
Abstract Security Classification: Unclassified
Name of Responsible Individual: Prof. Robert R. Read
Telephone: (408) 646-2382
Office Symbol: Code 55Re



Approved for public release; distribution is unlimited 



The Aggregation of Population Groups to 
Improve the Predictability of Marine Corps 
Attrition Estimation 

by 

Randall W. Larsen 

Captain, United States Marine Corps 
B.S., Iowa State University, 1976 



Submitted in partial fulfillment of the 
requirements for the degree of 



MASTER OF SCIENCE IN MANAGEMENT 



from the 

NAVAL POSTGRADUATE SCHOOL 
December 1987 



ABSTRACT 



This thesis presents an algorithm for the aggregation of
low inventory categories (small cells) which characterize
the population of Marine Corps, unrestricted, active duty 
officers. The basis for aggregating these small cells is 
the degree of homogeneity of historical attrition rates. 
The techniques of hierarchical cluster analysis are applied 
to the small cell problem in lieu of existing functional and 
organizational structures. 

This research demonstrates the adaptability of cluster 
analysis to loss rate aggregation and provides a shell for 
more refined model applications. Further, statistical 
stability and attrition rate homogeneity have been 
introduced to allow for subsequent application of shrinkage 
type parameter estimation methods associated with the 
development of an officer attrition rate generator. 



THESIS DISCLAIMER 



The reader is cautioned that computer programs developed 
in this research may not have been exercised for all cases 
of interest. While every effort has been made, within the 
time available, to ensure that the programs are free of 
computational and logic errors, they cannot be considered 
validated. Any application of these programs without 
additional verification is at the risk of the user. 






TABLE OF CONTENTS 



I. INTRODUCTION
   A. GENERAL
   B. BACKGROUND
   C. RESEARCH QUESTION
   D. KEY TERMS
   E. SCOPE OF THE THESIS
   F. ORGANIZATION

II. LITERATURE REVIEW
   A. PRIOR STUDIES
   B. OPERATIONAL AND THEORETICAL BACKGROUND

III. DATA BASE
   A. GENERAL

IV. CURRENT SYSTEM AND PRESENTATION OF NEW CONCEPT
   A. CURRENT SYSTEM
   B. AGGREGATION BY CLUSTER ANALYSIS
   C. DESCRIPTIVE VARIABLES
   D. CLUSTERING ELEMENTS
   E. SIMILARITY MATRIX
   F. CLUSTERING CRITERION
   G. DENDROGRAM

V. APPLICATION OF CLUSTER ANALYSIS
   A. GENERAL
   B. SMALL CELL DEFINITION
   C. STAGE ONE: YCS EXPANSION
   D. STAGE TWO: SMALL MOS GROUPS
   E. STAGE THREE: LARGE MOS GROUPS
   F. STAGE FOUR: MAJOR MOS GROUPS

VI. CONCLUSION
   A. SUMMARY AND CONCLUSIONS
   B. RECOMMENDATIONS

APPENDIX A: DATA FORMAT
APPENDIX B: EXAMPLE OF SUMMARY DATA FILE
APPENDIX C: LOSS RATE COMPUTER PROGRAMS
APPENDIX D: EXAMPLE LOSS RATE MATRIX FILE
APPENDIX E: METHOD TESTING
APPENDIX F: CLUSTER STRENGTH TABLE PROGRAM

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST



I. INTRODUCTION



A. GENERAL 

The purpose of this research is to enhance the 
predictability of Marine Corps officer attrition estimation. 
This paper is in support of a large, on-going effort 
concerning manpower model development and system integration 
under the broad title of Officer Planning and Utilization 
System (OPUS). Defense Systems Associates, Inc. (DSAI),
Rockville, Maryland, is the contracted system developer of
OPUS. The Navy Personnel Research and Development Center
(NPRDC), San Diego, California, aided by Professor R.R.
Read, Naval Postgraduate School, Monterey, California, is 
developing an officer attrition rate generator integral to 
OPUS. This thesis is conducted in conjunction with the work 
of NPRDC and Professor Read. 

B. BACKGROUND 

The United States Marine Corps officer corps is a 
hierarchical force of approximately 20,000 men and women. 
Marine Corps officer manpower planners are tasked with 
forecasting accessions, losses and promotions in order to 
meet present and anticipated personnel demands. 

In military manpower planning models, personnel flows 
are generally the result of vacancies created within the 
system. For the most part, vacancies are the result of
losses. Losses in the rank hierarchy prompt promotions.
Vacancies also create needs for accessions to replenish 
desired total force levels. As promotions and accessions 
are directly associated with losses, both are dependent on 
accurate loss forecasting. Underestimating losses can 
result in too few accessions, too few promotions, and 
ultimately may affect mission readiness. Overestimating 
losses can lead to too many accessions, underutilization of 
personnel, delays in promotion, and potential cost overruns. 

The manpower planners of the Marine Corps manage and 
organize officers based on rank and military occupational 
specialty (MOS). As such, losses must be anticipated for
each rank category and MOS. In order to project 
comprehensively the effects of attrition on the total force 
structure, losses are categorized by type and several 
descriptive variables associated with officer attrition 
behavior. The definition and discussion of the loss types 
and descriptive classifications are provided in Section D of 
this chapter. 

When the various loss categories and all the defined 
descriptive variables are considered simultaneously in a 
multidimensional array, the number of potential individual 
cells exceeds four billion. As the officer population 
barely exceeds 20,000, the vast majority of the cells are 
unoccupied for either structural or sampling reasons. An 
example of an unoccupied cell due to structural reasons
would be a cell identifying lieutenant colonels, of any
particular specialty, with six years of commissioned 
service. Such officers do not exist. Structurally zero 
inventories may be considered permanent conditions. 

An unoccupied cell described as a sampling zero occurs 
due to chance and is not necessarily a permanent condition. 
In such a case a particular rank, MOS, and YCS combination 
may not exist during a particular year. This condition may 
change the following year as a result of promotions, 
accumulating YCS, or change of MOS. 
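
The cell count quoted above can be checked against the numbers of alternatives listed in Table 1. The short Python sketch below is an illustration only, not part of the thesis programs. The dictionary name is hypothetical, and the MOS entry uses 140 as a lower bound because Table 1 lists that dimension as "140+".

```python
from math import prod

# Number of alternatives per descriptive variable, as listed in Table 1.
# MOS is given there as "140+", so 140 is used as a lower bound (assumption).
ALTERNATIVES = {
    "LOSS": 5, "RANK": 21, "MOS": 140, "YCS": 31, "SE": 14,
    "SC": 3, "SS": 7, "SEX": 2, "RACE": 4, "EDUC": 4,
}

total_cells = prod(ALTERNATIVES.values())
officers = 20_000  # approximate officer population quoted in the text

# Even if every officer fell into a different cell, at most this
# fraction of the array could be occupied.
max_occupied_fraction = officers / total_cells

print(f"potential cells: {total_cells:,}")
print(f"largest possible occupied fraction: {max_occupied_fraction:.6%}")
```

Under these assumptions the product is a little under 4.3 billion, consistent with the statement that the number of potential cells exceeds four billion.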

The situation of sparse data over a large number of 
cells makes the task of accurate empirical rate estimation 
difficult. Small populations of characteristically limited 
and sporadic data lead to statistical instability, which in 
turn aggravates the rate forecasting problem [Ref. 1:p. 13;
Ref. 2:p. 10; Ref. 3:p. 2]. This situation has been 
referred to as the "small cell problem" [Ref. 2:p. 10]. 

Presently, a highly comprehensive modeling system is 
being developed to predict future states of the officer 
force structure. This system, OPUS, is a computer-based 
planning tool reliant on predicted loss rates and target 
strength requirements [Ref. 4:pp. 2-1 to 2-59]. The Marine
Corps Officer Rate Projector (MCORP) is the source of loss
rate forecasts [Ref. 5:pp. 1-1 to 1-6]. Within MCORP, a
computer program algorithm provides an automated calculation
to meet certain computational requirements of the loss rate
forecasting system and represents the current solution to
the small cell problem.

C. RESEARCH QUESTION 

The primary research question is how to aggregate
low-inventory officer categories (small cells) into sets of
homogeneous attrition behavior in order to enhance the
forecasting techniques of developing manpower planning
models. The solution must be a dynamic scheme in which
small cells are aggregated in response to user-designated
minimum inventory thresholds. Moreover, the methodology must
be versatile and flexible, adapting to changing
conditions and the needs of manpower planners. This
research effort will group officers of similar rank, years
of commissioned service, and occupational specialty,
stressing similarity of historical loss rates.

The subsidiary research questions are as follows. First, what
features constitute a small cell, and which categories
represent small cells? Second, which small cells exhibit
similar loss rate behavior? Finally, how can the
aggregation of small cells be accomplished in order to meet
the needs of Marine Corps planners and developing manpower
models?

D. KEY TERMS 

The terms loss and attrition will be used
interchangeably. Losses and loss rates describe the flow of
officers from particular cells characterized by MOS, YCS,
rank, etc. Flows may be from one cell to another within the
Marine Corps or from a cell directly to the civilian labor
market. Flows within the Marine Corps represent a loss only
to the former cell, not to the Service. Movement due to
promotion, accumulation of service time, or change of MOS
are examples. Officers exiting a cell to the civilian labor
force constitute an inventory loss to the Marine Corps.
This project will focus on the attrition and the attrition
rates of those leaving the Service.

The following terms will be used frequently in this 
analysis within the narrow context of Marine Corps officer 
manpower management: 

- Accession: Accession refers to the commissioning of a
  new officer into the Marine Corps.

- Attrition: Attrition is the loss of an officer from the
  Service.

- Failed Select: Failed Select describes an officer not
  selected for promotion from either within or above the
  promotion zone. A lieutenant or captain who twice fails
  to be selected for promotion to a fixed rank must leave
  the Service. A major, lieutenant colonel, or colonel
  who twice fails to be selected for promotion to the next
  rank is limited to active service of 20, 26, or 30 years
  respectively.

- MCORP (Marine Corps Officer Rate Projector): MCORP is an
  interactive software system which calculates Marine
  Corps officer loss rates based on historical attrition
  data.

- LOS (Length of Service): LOS refers to the cumulative
  number of years served since date of service entry.

- MOS (Military Occupational Specialty): MOS is a
  four-digit code identifying specific, skill-related
  classifications of Marines.

- OPUS (Officer Planning and Utilization System): OPUS is
  a set of comprehensive computer-based models designed to
  support the data processing and forecasting requirements
  of Marine Corps officer planners.

- Regular Officer: A regular officer is an officer
  designated for long-term active duty, whose Service
  longevity is limited only by continued promotion and the
  statutory limits of service.

- Reserve Officer: A reserve officer is an officer
  designated to a fixed length of service. Such an
  officer may or may not be on active duty.

- YCS (Years of Commissioned Service): YCS refers to the
  cumulative number of years served since date of
  commissioning.

Table 1 gives a general description of the existing
officer classification system and the extent of the
classification alternatives. Appendix A offers a detailed
explanation of all classifications within the data format.



TABLE 1

OFFICER CLASSIFICATIONS

Designation   Description                             Alternatives

LOSS          Retirement, Release, Discharge,
              Resignation, and Other                       5

RANK          Warrant Officer to Colonel.
              Differentiated further as to
              restricted or unrestricted and
              Failed-Select or Nonfailed Select           21

MOS           Military Occupational Specialty.
              This category may be further
              expanded by considering Secondary
              and Additional MOSs                        140+

YCS           Years of Commissioned Service.
              One to 31 plus                              31

SE            Source of Entry                             14

SC            Service Component: Regular,
              Reserve to Regular, and
              Reserve Service                              3

SS            Service School Completion                    7

SEX           Sex                                          2

RACE          White, Black, Hispanic and Other             4

EDUC          Educational Attainment: Non-college
              grad, Four-year degree, Masters
              Degree, and Doctorate                        4


E. SCOPE OF THE THESIS

For the stated purpose of this project, research will be
limited to active duty, unrestricted officers from the rank
of second lieutenant to colonel. The management of inactive
duty officers (inactive duty reservists and retirees) is
sufficiently different to be excluded from OPUS and is
therefore of little relevance to this study. Limited duty
officers (LDOs) are addressed separately in OPUS due to
their unique career paths and characteristics of service;
thus, this category will not be included in this thesis.
Finally, the Marine Corps general officers (flag-rank) and
warrant officers will not be discussed in this research
project. General officers are an extremely small component
of the total force structure, with required management
taking place at the highest Service level. The warrant
officers represent a narrowly defined population associated
with limited MOSs linked to the LDO categories and, as a
group, have exhibited strong statistical stability in
attrition behavior.

F. ORGANIZATION 

Chapter II presents a synopsis of previous research
pertinent to this thesis, followed by a brief review of the
theoretical and operational literature relevant to the
research effort.

The structure and content of the utilized data bases are 
explained in Chapter III. 

Chapter IV begins with an explanation of the existing
methodology for small cell aggregation within MCORP. The
rationale is then given for the selection of cluster
analysis in solving the small cell problem. Finally, the
concepts and characteristics of the chosen technique are
detailed.

The discussion in Chapter V describes the specific
application of the clustering procedure to the research
problem as well as the validation and analysis of the
results.

Chapter VI presents the thesis summary and 
recommendations for ultimate application and maintenance of 
the improved methodology in the Marine Corps manpower 
planning system. 

The appendices contain various details of interest to
the reader desiring a more thorough explanation or
background on data format, applied computer programs,
related cluster criterion testing, and other supporting
material.



II. LITERATURE REVIEW 



A. PRIOR STUDIES 

This project should be recognized as a logical 
continuation of recent work done by Majors D.D. Tucker,
USMC, and J.R. Robinson, USMC, and Colonel Amin Elseramegy,
Egyptian Air Force, in their separate theses at the Naval
Postgraduate School.

In his September 1985 thesis, Tucker [Ref. 1] 
demonstrated the application of statistical shrinkage type 
parameter estimation techniques to the problem of small 
cells. His results were promising, though exploratory. One 
of the major results was the identification of the 
inadequate aggregation methods used by the existing modeling 
system. He felt his work was handicapped by the lack of 
homogeneity of loss rates and the instability of aggregated 
attrition behavior. To thoroughly test his sophisticated 
shrinkage estimation schemes, and ultimately to apply them, 
meaningful and well-behaved empirical attrition rates need 
to be achieved. 

Elseramegy [Ref. 6] completed his thesis work on the
"CART Program: The Implementation of the Classification and
Regression Tree Resubstitution Implementation Application"
in December 1985. A goal of his thesis was to apply the
CART program to the existing forecasting methods of Marine
Corps officer attrition rates. Ultimately the program
proved too difficult for effective use and suffered
structural limitations when dealing with cells of
potentially widely varying inventories.

Robinson's March 1986 thesis [Ref. 2], "Limited
Translation Shrinkage Estimation of Loss Rates in Marine
Corps Manpower Models," was a direct follow-on to Tucker's
work. He tested and compared various statistical estimation
techniques for the generation of attrition rates. Again,
his results revealed the inadequacies of existing officer
category aggregations.

Other useful background literature included studies and
reports of U.S. Navy issues closely related to this thesis.
The work of Siegel [Ref. 7] at NPRDC describes the
seven-year attrition rate and forecasting methods used by
the Navy. His report describes the Officer Retention
Forecast Model (ORFM) and illustrates its capabilities.

A second study done by Bres and Row [Ref. 8] discusses 
time series-based forecasting techniques used with great 
success by the Navy in forecasting loss rates within the 
unrestricted line officer community. 

Finally, work by Butterworth and Milch [Ref. 3] offers
valuable insight into hierarchical aggregation as applied
to Navy enlisted ratings.



B. OPERATIONAL AND THEORETICAL BACKGROUND 



As this thesis requires a functional knowledge of
current and future Marine Corps manpower models, the
following literature provides necessary operational
background.

The "Functional Description for the Development of
the Officer Planning and Utilization System (OPUS)," produced
by DSAI [Ref. 4], is a written description of the OPUS
project from the developer to the Marine Corps. It
includes performance requirements of the various models,
preliminary design strategies, and user inputs.

The "User's Manual for the Officer Rate Generator," by 
DSAI [Ref. 9], provides the reader with information 
necessary for effective use of the officer loss rate 
generator. 

In "System Design for the Marine Corps Officer Rate 
Projector (MCORP)" by NPRDC [Ref. 5], the MCORP system is
discussed in general terms based on operational objectives 
and design. 

The "OPUS - System Specification" by DSAI [Ref. 10]
provides a detailed definition of the functions of the
Year-Group and Steady-State Promotion models of OPUS.

The "OPUS - System Specification for Optimum Officer Force
Model" by DSAI [Ref. 11] provides an in-depth definition of
the functions of the Optimum Force Model and of the
interfacing techniques for use with other systems and
programs.

The "OPUS - System Specifications for Officer Population
Simulation" by DSAI [Ref. 12] defines the functions and
details for interfacing the Officer Population Simulator
with the planning models of OPUS.

In the "Users Manual for the Officer Planning and
Utilization System (OPUS)," DSAI [Ref. 13] provides
application information for the recently developed
Steady-State Promotion and Year-Group models.

Several textbooks address the theoretical
concepts as well as the relevant statistical and modeling
techniques. These include Bartholomew and Forbes'
Statistical Techniques for Manpower Planning [Ref. 14];
Berenson, Levine, and Goldstein's Intermediate Statistical
Methods and Applications [Ref. 15]; and Grinold and
Marshall's Manpower Planning Models [Ref. 16].

In his classic work on the subject, Johnson [Ref. 17]
describes the theory and nature of hierarchical
clustering and gives illustrative examples pertinent to
this thesis. Further description, discussion and
application of cluster analysis techniques and algorithms
were provided by Anderberg, Cluster Analysis for
Applications [Ref. 18]; Lorr, Cluster Analysis for Social
Scientists [Ref. 19]; and Norusis, SPSSX - Advanced
Statistical Guide [Ref. 20].



III. DATA BASE 



A. GENERAL 

The key data base for this analysis is a summary data 
file designed and compiled by personnel of NPRDC. The
summary data file was created from two Marine Corps files:
the Headquarters Master File (HMF) and the Quarterly
Statistical Transaction File (STATS).

The HMF is the primary source of data for historical 
officer inventories. September 30 (end of fiscal year) 
"snapshots," from 1977 to 1986, are used to produce these 
inventories. The STATS provides input for the generation of 
historical losses. The two files are merged and sorted to
create counts and inventories of all Marine Corps officers
over the ten-year period [Ref. 5:pp. 2-4 to 2-22].

The summary data file separates the individual records 
according to the unique characteristics of MOS, LOS, rank 
and loss type combinations. The data format is presented in 
Appendix A. 
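
The idea of reducing individual records to per-cell inventory and loss counts can be sketched in a few lines of Python. This is a simplified illustration, not the NPRDC procedure: the record layout, the field names, and the sample values are all hypothetical, loss type is omitted, and the actual data format is the one given in Appendix A.

```python
from collections import defaultdict

# Hypothetical record layout (not the Appendix A format):
# (fiscal_year, mos, rank, ycs, lost), where lost is True if the
# officer left the Service during that fiscal year.
records = [
    (1985, "0302", "CAPT", 6, False),
    (1985, "0302", "CAPT", 6, True),
    (1985, "0302", "CAPT", 6, False),
    (1985, "4002", "CAPT", 6, False),
    (1986, "0302", "CAPT", 7, False),
]


def summarize(records):
    """Return {cell: [inventory, losses]} keyed by (year, mos, rank, ycs)."""
    cells = defaultdict(lambda: [0, 0])
    for year, mos, rank, ycs, lost in records:
        cell = cells[(year, mos, rank, ycs)]
        cell[0] += 1          # inventory count
        cell[1] += int(lost)  # loss count
    return dict(cells)


def loss_rate(inventory, losses):
    """Empirical loss rate; None for an unoccupied cell."""
    return losses / inventory if inventory else None


summary = summarize(records)
for cell, (inventory, losses) in sorted(summary.items()):
    print(cell, inventory, losses, loss_rate(inventory, losses))
```

Cells that never appear in the records are simply absent from the dictionary, which mirrors the unoccupied cells discussed in Chapter I.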

The summary data file contains a complete summary of the
actual officer inventory and loss counts for each combination
of variable characteristics descriptive of existing
officers, by fiscal year. Appendix B provides an example of
raw data from the summary data file. The data file is a
direct access file accessible via the Conversational
Monitoring System (CMS).

Additionally, the MCORP model, using a flexible 
multiple-diskette version of the summary data file, allows 
rapid access to historical inventories and user-weighted 
loss forecasts through microcomputer application. The MCORP 
model is capable of generating output in several convenient 
report formats: Groups by Year, Groups by YCS, and Grade by
YCS.

The Defense Manpower Data Center (DMDC), Monterey,
California, provided a third source of officer inventory and 
attrition data. These Defense Department data are 
essentially similar to those in the summary data file and as 
a result afford an additional reference resource and an 
excellent basis for input and output comparisons. 






IV. CURRENT SYSTEM AND PRESENTATION OF NEW CONCEPT 



A. CURRENT SYSTEM 

The historic loss rate calculation is essential to the 
successful application of the manpower models as emphasized 
by Bartholomew and Forbes [Ref. 14] and Grinold and Marshall
[Ref. 16]. Loss rates for OPUS are generated by MCORP from 
data found in the summary data file. It is the calculation 
of these loss rates that is hampered by low officer 
inventories within specific cells, i.e., the small cell 
problem. 

The current approach to answering the small cell problem
is termed the "Small Cell Override Methodology" [Ref. 5:pp.
3-10 to 3-11, H-1]. The goal of the override methodology is to
expand the inventories of categories with small populations
to avoid over- and under-estimating attrition patterns due
to low denominator ratios. As an example, the loss rate
resulting from the retirement of one officer during a
period from a population of three (a small cell) probably
yields a poor base from which to estimate attrition behavior
for that group.
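
The weakness of a low denominator can be made concrete with the usual binomial standard error of a proportion. The sketch below is illustrative only. The function name is hypothetical, the binomial model is an assumption not stated in the text, and the cell of 300 is invented for comparison.

```python
from math import sqrt


def loss_rate_with_error(losses, inventory):
    """Empirical rate and its approximate binomial standard error."""
    rate = losses / inventory
    std_error = sqrt(rate * (1.0 - rate) / inventory)
    return rate, std_error


# One retirement from a cell of three officers (the example in the text).
small_rate, small_se = loss_rate_with_error(1, 3)

# The same rate observed in a hypothetical cell one hundred times larger.
large_rate, large_se = loss_rate_with_error(100, 300)

print(f"n=3:   rate={small_rate:.3f}, standard error={small_se:.3f}")
print(f"n=300: rate={large_rate:.3f}, standard error={large_se:.3f}")
```

Both cells show the same empirical rate of one third, but the standard error for the cell of three is ten times larger, which is why expanding the inventory is attractive.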

Though the data base contains the inventories of ten
years, the dynamic nature of officer manpower flows requires
that rates reflect current trends as well as long-term
historical attrition. The present procedure is a prototype.
It is acknowledged by user and developer as an interim, ad
hoc process based upon perceived officer attrition
similarities along traditional classification structures.
The specific need for refinement in the small cell 
aggregation methodology has been demonstrated in the 
preceding attrition estimation improvement research of 
Tucker [Ref. 1] and Robinson [Ref. 2]. 

At present, annual, and even quarterly, loss rate 
calculations are insufficient to meet the acceptable 
forecasting tolerances required of the officer manpower 
planners. Forecasting errors of between 50 and 100 cases 
occur. The impact such errors have when reconciled with 
legislated strength authorizations is significant and 
costly. 

With recent emphasis on large scale officer reductions, 
monthly forecasts are becoming common management 
requirements. The estimating difficulties encountered with 
small annual categorical loss inventories are multiplied 
when faced with monthly estimation demands. 

Presently, MCORP offers the user alternative selections 
of small cell population minimums. Cell inventory 
specifications are available from one to 50, with a default 
inventory threshold of 30 cases. This requires that the 
cell population exceed the specified minimum number of 
cases. If the cell inventory fails to meet the threshold 
requirement, the small cell override methodology activates.
A hierarchical series of cellular expansions takes place
until the required population is reached. The following 
paragraphs provide a verbal explanation of the small cell 
expansion. 

1. Test One

Under this test, the single cell is expanded
laterally, across YCS, in a stepwise fashion, potentially to
include all YCSs (see footnote 1). MOS, RANK, and all other
variables are unchanged. If this test fails to reach the
threshold inventory, then proceed to Test Two.

Footnote 1: Year 20 is a barrier to YCS expansion from either
direction, due to the retirement eligibility. The 20-year
YCS is recognized as an obvious boundary of change in loss
behavior.

2. Test Two

With Test Two the single cell is expanded to include
all MOSs in the operational MOS Group of the designated MOS.
See Table 2 for a description of the traditional MOS groups.
YCS, RANK, and all other variables are unchanged. If this
test fails to reach the threshold inventory, then proceed to
Test Three.

TABLE 2

TRADITIONAL MOS GROUPS

Group Name   MOS

COMBAT       0302

COM/SUPP     0802 1302 1802 1803

COM/SERV     0180 0202 0402 2502 2602 3002 3060
             3402 3415 3502 4002 4302 5803

HELO         7562 7563 7564 7565 7566

TACAIR       7501 7508 7509 7511 7522 7523 7543
             7545 7556 7557 7576

NFO          7583 7585 7586 7588

AIR/GRD      6002 7204 7208 7210 7820

LAWYER       4402

ALLOTHER     0101 0160 0170 0201 0205 0210 0301
             0401 0430 0801 0803 1120 1301 1310
             1360 1390 1402 1502 1801 2101 2110
             2120 2125 2305 2501 2601 2802 2805
             2810 2830 3001 3010 3050 3070 3102
             3302 3402 3406 3410 3501 3510 4001
             4006 4010 4130 4301 4401 4430 4602
             5502 5505 5702 5910 5950 5970 6001
             6004 6007 6302 6502 6802 7002 7201
             7301 7330 7380 7500 7510 7520 7521
             7540 7542 7550 7560 7575 7580 7581
             7584 7587 7597 7598 7599 9901 9904
             9906 9907 9908 9914 9925

3. Test Three

Using Test Three the single cell is expanded to
include all MOSs in its group and YCSs are expanded
laterally, in a stepwise fashion, potentially to include all
YCSs. RANK and all other variables are unchanged. If this
test fails to reach the threshold inventory, then proceed to
Test Four.

4. Test Four

Under Test Four the single cell is expanded to
include all MOSs. YCS, RANK, and all other variables are
unchanged. If this test fails to reach the threshold
inventory, then proceed to Test Five.

5. Test Five

With Test Five the single cell is expanded to
include all MOSs, and YCSs are expanded laterally in a
stepwise fashion, potentially to include all YCSs. RANK and
all other variables are unchanged.
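
The five tests above can be summarized in a short Python sketch. It is a reading of the verbal description only, not the MCORP code. The function names, the dictionary layout keyed by (MOS, YCS) for a fixed rank, and the sample inventories are hypothetical. Two details are assumptions: the threshold comparison is taken as "at least", and the year-20 barrier is taken to mean that a YCS window never spans both year 19 and year 20.

```python
YCS_MIN, YCS_MAX = 1, 31
BARRIER = 20  # assumption: expansion never crosses between YCS 19 and 20


def ycs_windows(ycs):
    """Yield progressively wider YCS ranges on one side of the barrier."""
    if ycs < BARRIER:
        lo_limit, hi_limit = YCS_MIN, BARRIER - 1
    else:
        lo_limit, hi_limit = BARRIER, YCS_MAX
    lo = hi = ycs
    while True:
        yield range(lo, hi + 1)
        if lo == lo_limit and hi == hi_limit:
            return
        lo, hi = max(lo - 1, lo_limit), min(hi + 1, hi_limit)


def count(inventory, mos_list, ycs_range):
    """Total inventory over the given MOSs and YCS range."""
    return sum(inventory.get((m, y), 0) for m in mos_list for y in ycs_range)


def override(inventory, mos_groups, mos, ycs, threshold=30):
    """Return (test, mos_list, ycs_range, total) for the first expansion
    that reaches the threshold; test 0 means no expansion was needed.
    Returns None if even Test Five falls short."""
    single = range(ycs, ycs + 1)
    group = mos_groups[mos]
    all_mos = sorted({m for m, _ in inventory} | set(group))
    total = count(inventory, [mos], single)
    if total >= threshold:  # ">=" is an assumption about the comparison
        return 0, [mos], single, total
    # (test number, MOS scope, whether YCS is widened stepwise)
    tests = [
        (1, [mos], True),
        (2, group, False),
        (3, group, True),
        (4, all_mos, False),
        (5, all_mos, True),
    ]
    for number, scope, widen in tests:
        for window in (ycs_windows(ycs) if widen else [single]):
            total = count(inventory, scope, window)
            if total >= threshold:
                return number, scope, window, total
    return None


# Invented inventories for one rank; keys are (MOS, YCS).
inventory = {
    ("7562", 8): 4, ("7562", 9): 5,
    ("7563", 8): 12, ("7564", 8): 10,
    ("0302", 8): 40,
}
mos_groups = {"7562": ["7562", "7563", "7564"]}  # part of the HELO group

result = override(inventory, mos_groups, "7562", 8, threshold=20)
print(result)
```

With these invented counts, the cell for MOS 7562 at YCS 8 cannot reach a threshold of 20 by YCS expansion alone, so the search stops at Test Two, where the MOS group supplies enough officers.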

The current small cell aggregation methodology
implies several troublesome assumptions. Test 1 expands
cells across YCS. The procedure acknowledges the 20-year
mark as the single truncation point for significant changes
in YCS-based loss behavior. However, in recent years,
losses of Marine Corps captains, for example, have taken
place over the span of 12 separate YCSs without crossing the
20-year barrier. To assume homogeneous behavior of
similarly categorized officers across a broad range of
career experience and maturity does not jibe with true
attrition rate relationships.

Exploratory clustering of loss rates by YCS for each 
rank has produced consistent empirical evidence supporting 
the contention that wide ranges in attrition behavior do 
occur within classifications based on rank and MOS. 
Bartholomew and Forbes [Ref. 14:pp. 12-16] discuss the 
matter of the influence of length of service on attrition 
rates in more detail. 
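
A minimal sketch of what clustering loss rates by YCS might look like is given below. It is not the procedure used in this thesis. Complete linkage is chosen only for illustration, the function name is hypothetical, and the loss rates are invented to show a within-rank spread rather than taken from the summary data file.

```python
def complete_linkage(rates, n_clusters):
    """Agglomerative clustering of {label: rate} using complete linkage."""
    clusters = [[label] for label in rates]

    def distance(a, b):
        # Complete linkage: largest pairwise gap between members.
        return max(abs(rates[x] - rates[y]) for x in a for y in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Invented annual loss rates for one rank, keyed by YCS.
rates_by_ycs = {4: 0.15, 5: 0.16, 6: 0.08, 7: 0.07,
                8: 0.03, 9: 0.03, 10: 0.025}

groups = complete_linkage(rates_by_ycs, n_clusters=3)
print(groups)
```

In this invented example the seven YCS values separate into three groups with visibly different loss rates, none of which crosses the 20-year mark.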






With the present override, small cells are expanded 
across MOSs within functionally defined MOS groups in Test 
2. This test assumes similar loss behavior among officers 
in the groups described in Table 1. Do pilots of different 
fixed-wing aircraft types, group TACAIR, exhibit homogeneous 
attrition rates? One might expect the job opportunities 
with civilian airlines to vary between pilots of KC-130 
propeller-driven refuelers and pilots of F/A-18 
fighter/attack airplanes. Similarly, in the COM/SERV group,
highly trained data systems officers (MOS 4002), with
talents readily transferable to the civilian labor market,
are aggregated with officers possessing the more
military-specific skills of the intelligence community (MOS 0202).
Finally, the gross aggregation of the ALLOTHER category
combines such diverse groups as basic infantry officers (MOS
0301), disbursing officers (MOS 3402), and student judge
advocates (Juris Doctorate in hand, MOS 4401). Though the
MOSs in this group tend to be generally rank or YCS 
specific, loss rates may show excessive heterogeneity in 
cases where non-MOS categories coincide over diverse 
occupational specialties. 

Test 3 expands cells to include commonly classified
officers across all MOSs and YCSs. This aggregation can be
characterized as potentially sharing the same assumption
weaknesses as the previous stages of the override
methodology.






Small cell expansion in Test 4 and Test 5 includes 
the loss inventories of all MOSs and, in Test 5, all YCSs as 
well. Though these levels in the hierarchy are infrequently
exercised, there appears to be little theoretical basis for
assuming that such wide-ranging aggregation generates
particularly homogeneous groups of loss rate behavior.

B. AGGREGATION BY CLUSTER ANALYSIS 

This aggregation methodology is proposed in support of 
an empirical Bayes officer attrition rate estimation scheme 
under development by Professor R.R. Read. Such schemes 
utilize the currently popular shrinkage type parameter 
estimation methods recently researched by Tucker [Ref. 1] 
and Robinson [Ref. 2]. 

Statistical methods of this category "shrink" groups of 
empirical cell rates toward a grand mean. Aggregate rate 
shrinkage enhances the statistical stability of loss rates, 
particularly those of small cells. Shrinkage estimation 
procedures perform best if the designated groups 
(aggregates) are as homogeneous as possible. It is this 
final characteristic of internal aggregate homogeneity which 
led to the application of cluster analysis. 
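
The general idea of shrinkage can be illustrated with a minimal Python sketch. It is not the empirical Bayes estimator under development; the weight n/(n + k) and the constant `k` are assumptions chosen only to show how small cells are pulled more strongly toward the pooled rate of their aggregate.

```python
def shrink_rates(losses, strengths, k=50.0):
    """Shrink each cell's empirical loss rate toward the pooled (grand) rate.

    losses, strengths: per-cell loss counts and inventories for one aggregate.
    k: hypothetical tuning constant (an assumption); larger k shrinks harder.
    """
    grand = sum(losses) / sum(strengths)
    shrunk = []
    for loss, n in zip(losses, strengths):
        raw = loss / n
        w = n / (n + k)  # a small cell places little weight on its own rate
        shrunk.append(w * raw + (1.0 - w) * grand)
    return shrunk


# A 5-officer cell and a 1000-officer cell placed in the same aggregate.
rates = shrink_rates(losses=[1, 50], strengths=[5, 1000])
```

The small cell's raw rate of .200 moves most of the way toward the pooled rate, while the large cell's rate barely changes. This is why the homogeneity of the aggregate matters: the pooled rate is only a sensible target if the cells grouped together behave alike.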

The process of cluster analysis provides an effective 
tool with which to explore the existing data set for clues 
about data categorization. In this research the objects of 
analysis are the specifically classified officer 
descriptions, i.e., individual cells, in the historical
summary data file. The purpose of clustering is to discover 
a classification scheme for individual cells which reflects 
increased homogeneity in attrition rates when compared to 
traditional groupings. 

As described in the previous section, the present
aggregation methodology relies on officers in organizationally
and functionally defined groups to demonstrate similar
attrition behavior. Cluster analysis can lead to the
discovery of alternative schemes to the traditional methods 
of officer categorization. Alternative population 
partitions which show improved homogeneity of internal 
historical loss rates can serve as the basis for improved 
small cell aggregation methods. 

Cluster analysis includes many heuristic procedures and 
statistical applications which can sort data into 
homogeneous subgroups based on certain measures of 
similarity. Of application to this study is the 
hierarchical clustering technique. A brief description of 
this procedure follows. Greater detail is provided by 
Johnson [Ref. 17], Anderberg [Ref. 18], and Lorr [Ref. 19]. 

Hierarchical clustering aggregates objects into sets of 
clusters according to selected criteria of measured 
similarity between data elements. A common technique of 
visual representation of a hierarchical clustering scheme is 
the dendrogram (see Figure 1).

[Figure: dendrogram of Cases 1 through 5 merging into a single root; horizontal axis: distance scale, 0 to 25]

Figure 1. Dendrogram



Cases 1-5 in Figure 1 represent individual objects. The 
root depicts the aggregation of all objects into one set. 
By moving from left to right the various entities are 
sequentially merged into larger and fewer clusters according 
to the extent of similarity. This is termed the 
agglomerative method. The distance scale represents the 
degree of selectivity associated with the formation of the 
clusters. The smaller the distance, the closer, more 
similar, are the grouped objects. 

In the following sections the major steps in the cluster 
analysis methodology used in this thesis will be described. 

C. DESCRIPTIVE VARIABLES 

The variables selected to describe officer attrition are 
the attrition rates of the loss types as discussed in 
Chapter I: Retirement, Release, Discharge, Resignation, and 
Other. This is an inclusive list of both voluntary and 
involuntary attrition. Summation of the loss inventories 
equates to the total strength losses. 

The data base provides loss counts for each cell in man- 
quarters, over the ten years of data. Appendix C provides 
the FORTRAN computer programs used in the creation of 
various group loss rates. The basic equation used in the 
computation of annual loss rates is as follows: 



$$
\begin{aligned}
l_{ijk} &= \text{annual loss rate} \\
        &= \frac{4 \times \text{man-qtr loss counts}}{\text{year average strength}} \\
        &= \frac{4\,L_{ijkm}}{\tfrac{1}{2}\,(S_{i-1,k} + S_{i,k})} \\
        &= \frac{L_{ijkm}}{.125\,(S_{i-1,k} + S_{i,k})}
\end{aligned}
$$

where:

$l_{ijk}$ = loss rate of type k, group j, year i

$L_{ijkm}$ = loss inventories in quarter m, type k,
group j, year i

$S_{i,k}$ = year end inventory of group k in year i.

Within the data file, the annual loss cell inventories 
have been divided by four for administrative reasons in 
order to provide planners with quarterly counts. In order 
to annualize the inventories, the quarterly counts must be 
multiplied by four, then divided by the total year strength 
figure. In this case, year strength is an average. As 
losses take place throughout the year, an average of the 
beginning and end inventories is the best figure available 
for total strength. 
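
The annualization described above can be sketched in Python as follows. This is an illustration, not a transcription of the FORTRAN programs in Appendix C; the function and argument names are assumptions.

```python
def annual_loss_rate(quarterly_loss, start_strength, end_strength):
    """Annual loss rate for one cell, loss type, and year.

    quarterly_loss: the stored man-quarter loss count (annual losses / 4).
    start_strength, end_strength: year-end inventories of the previous
    and current year.
    """
    avg_strength = (start_strength + end_strength) / 2.0
    if avg_strength <= 0:
        raise ValueError("average strength must be positive")
    return 4.0 * quarterly_loss / avg_strength


# 2.5 losses per quarter against inventories of 190 and 210 officers:
# 4 * 2.5 / 200 = .05
rate = annual_loss_rate(2.5, 190, 210)
```

The same value results from dividing the quarterly count by .125 times the sum of the two inventories, which is the final form of the equation.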

Two weighted rates were generated for clustering
applications. The two rates were computed primarily to
facilitate aggregation analysis. Neither rate presumes to
reflect the most correct weighting schedule. Such claims
are beyond the scope of this research. As a matter of
interest, the weighted rates were typically very similar,
within ± .005. On the few occasions when the
difference was as much as ± .01, it was generally an
indicator of MOS restructuring due to policy changes or
technological advances.

The first rate was an annualized, most-recent-five-year
rate recommended for consideration by Professor R.R. Read. 
The equation follows: 

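$$
\begin{aligned}
l_{5yr} &= \text{five-year average loss rate} \\
        &= \frac{4 \times (\text{sum of man-qtr loss counts, 1982-1986})}{\text{sum of yr average strengths, 1982-1986}} \\
        &= \frac{4 \sum L_{ijkm}}{\sum \tfrac{1}{2}\,(S_{i-1,k} + S_{i,k})} \\
        &= \frac{\sum L_{ijkm}}{.125 \sum (S_{i-1,k} + S_{i,k})}
\end{aligned}
$$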

In this rate equation the quarterly loss inventories of 
the five-most-recent-years are summed and annualized 
(multiplied by four) . The result is then divided by the 
summation of the average total annualized inventories of the 
same five years. 

Such a ratio results in an equal weighting of data from 
the last five years. The implied presumption of this rate 
is that an average of recent attrition data provides a 
better picture of representative strength loss ratios than 
does any one previous year. It further presumes that data
from years 1978-1981 are no longer representative.
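
A minimal Python sketch of the five-year rate follows. It assumes one stored quarterly loss count per year and a list of year-end inventories that begins with the year preceding the window; these input conventions and names are illustrative assumptions, not those of the Appendix C programs.

```python
def five_year_loss_rate(quarterly_losses, year_end_strengths):
    """Equal-weight loss rate over a window of years.

    quarterly_losses: one man-quarter loss count per year in the window.
    year_end_strengths: year-end inventories, starting with the year
    before the window, so it holds one more entry than quarterly_losses.
    """
    if len(year_end_strengths) != len(quarterly_losses) + 1:
        raise ValueError("need one more inventory than loss counts")
    total_loss = 4.0 * sum(quarterly_losses)  # annualize the summed counts
    total_strength = sum(
        (prev + curr) / 2.0
        for prev, curr in zip(year_end_strengths, year_end_strengths[1:])
    )
    return total_loss / total_strength


# Five years (1982-1986) with 2 losses per quarter and a steady inventory
# of 100 officers: 4 * 10 / 500 = .08
rate = five_year_loss_rate([2, 2, 2, 2, 2], [100] * 6)
```

Because the numerator and denominator are summed before division, years with larger inventories contribute more observations, but no year receives an explicit weight.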






A second contrived rate was the weighted loss ratio over 
the entire data set as recommended by MCORP designer, B. 
Siegel. The weighting scheme is shown below: 



    Year    Weight    Ratio

    1978      1       .034
    1979      1       .034
    1980      1       .034
    1981      1       .034
    1982      1       .034
    1983      3       .103
    1984      5       .172
    1985      7       .241
    1986      9       .310



Such a schedule strongly weights the loss data of the
most recent years, with earlier years receiving less
emphasis. Using this approach, the generally desired
preference of utilizing all available data is to an extent
realized while giving proportionally greater emphasis to
recent activity. The basic equation follows:



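$$
\begin{aligned}
l_{10yr} &= \text{ten-year weighted loss rate} \\
         &= \frac{4 \times (\text{sum of weighted man-qtr loss counts, 1978-1986})}{\text{sum of wtd yr average total strengths, 1978-1986}} \\
         &= \frac{4 \sum wt_i\, L_{ijkm}}{\sum wt_i\, \tfrac{1}{2}\,(S_{i-1,k} + S_{i,k})} \\
         &= \frac{\sum wt_i\, L_{ijkm}}{.125 \sum wt_i\,(S_{i-1,k} + S_{i,k})}
\end{aligned}
$$

where:

$wt_i$ = weight ratio for year i.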

In the ten-year weighted loss rate, the annualized loss 
inventories are multiplied by a weighting factor prior to 
summation. (The sum of the weighting factors must equal
one.) The weighted sum of loss inventories is then divided
by the sum of the similarly weighted average total strength
inventories.
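
The weighting schedule and the weighted rate can be sketched in Python as shown below. The weights are those listed above; the function names and the use of precomputed yearly average strengths as input are assumptions made for illustration.

```python
# Weights from the schedule above; each ratio is the weight divided by 29.
WEIGHTS = {1978: 1, 1979: 1, 1980: 1, 1981: 1, 1982: 1,
           1983: 3, 1984: 5, 1985: 7, 1986: 9}


def weight_ratios(weights=WEIGHTS):
    """Normalize the integer weights so that the ratios sum to one."""
    total = sum(weights.values())
    return {year: w / total for year, w in weights.items()}


def weighted_loss_rate(quarterly_losses, avg_strengths, weights=WEIGHTS):
    """Weighted loss rate over all years in the schedule.

    quarterly_losses: {year: man-quarter loss count}
    avg_strengths: {year: average of the beginning and end inventories}
    """
    ratios = weight_ratios(weights)
    numerator = 4.0 * sum(ratios[y] * quarterly_losses[y] for y in ratios)
    denominator = sum(ratios[y] * avg_strengths[y] for y in ratios)
    return numerator / denominator


ratios = weight_ratios()
```

Rounded to three places, the computed ratios reproduce the Ratio column of the schedule. With constant losses and strengths in every year, the weighted rate reduces to the simple annual rate, which offers a quick check on the normalization.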

Appendix C again provides a display of the FORTRAN 
computer programs utilized. To facilitate the application 
of these rates, the ratios are saved and assembled into a 
file, in matrix form, by loss type, according to year and 
specified group. In this situation the specified group 
provides the clustering elements subject to ultimate 
aggregation. An illustration is provided below in Table 3,
which includes annual rates, the most-recent-five-year
average rate (year 98), and the ten-year weighted loss rate
(year 99). Appendix D furnishes a complete example of the
loss rate matrix.

TABLE 3

EXAMPLE LOSS RATES BY YEAR AND MOS GROUP

    Year   Group   Retire   Release   Discharge   Resign   Other

     78      1      .027     .037       .005       .019     .002
     78      2      .029     .049       .006       .021     .002
      .      .        .        .          .          .        .
     78     12      .064     .012       .003       .005     .001
      .      .        .        .          .          .        .
     86      1      .021     .053       .003       .016     .001
      .      .        .        .          .          .        .
     86     12      .003     .005       .004       .006     .000
     98      1      .020     .040       .007       .016     .002
      .      .        .        .          .          .        .
     98     12      .035     .014       .005       .008     .001
     99      1      .021     .045       .007       .017     .002
      .      .        .        .          .          .        .
     99     12      .036     .014       .005       .007     .001

D. CLUSTERING ELEMENTS

An understanding of the research purpose and initial
familiarization with the data set serve as the basis for
the development of a clustering strategy. Definitive
recipes cannot exist for the selection of clustering
elements which will lead to interesting and relevant
classifications. Further, as emphasized by Anderberg [Ref.
18:pp. 182-185], a clustering strategy is generally a
sequential process, responding to increased knowledge about
the data and adapting the new information at every stage.

In this study, the clustering elements selected include
YCS, MOS (including various MOS groups), and RANK.
Justification for the selection of clustering units follows
in the paragraphs below.

Length of service is acknowledged by Bartholomew and
Forbes [Ref. 14:p. 14], and others, as a primary, if not
dominant, factor affecting the propensity of an individual
to leave an organization. In general, the propensity to
leave decreases with increased length of service, salary,
and status.

In this research, YCS is used as a surrogate for length 
of service. This substitution appears appropriate as, in 
the large majority of cases, YCS equals length of service. 
In the relatively infrequent situations where unrestricted 
officers have significant amounts of enlisted service, YCS 
is less than actual length of service. In these cases, 
however, YCS is still a major determinant for promotion, 
authority, and responsibility. 

In the loss rate matrix formation, YCS (one year to 31 
years) becomes a specified row identity. Appendix C 
provides the computer program utilized and Appendix D offers 
the loss rate matrix file. YCS is initially clustered over
the entire data set for a broad perspective of loss
rate behavior. Subsequently, YCS is clustered with respect
to more homogeneous MOS groups for comparison and analysis.

The 140+ officer MOSs in the Marine Corps represent a 
diverse collection of fields and duty descriptions. MOSs 
vary in the amount and expense of initial and follow-on 
training required to fulfill occupational requirements. As 
a result, varying degrees of transferability of skills to 
the civilian labor market can be identified with MOS 
categorization. The training required of a lawyer (MOS
4402) or basic jet-fighter pilot (MOS 7520) is far more
expensive in time and money than initial training for an
officer in the intelligence specialty (MOS 0202). Further,
the value of equally transferable skills can also vary.
Both a multi-engine KC-130 aircraft pilot (MOS 7557) and a
military police officer (MOS 5803) might share easily
transferable skills but the corresponding civilian salaries 
for similarly successful former officers may be quite 
different. 

Some specialties exhibit more typically arduous duties,
such as infantry (MOS 0302) or combat engineer (MOS 1302).
Such differences may be reflected in the collective 
attrition behavior. Still other specialties may be 
identified as quite unique in a variety of obvious and less 
than obvious characteristics of duty, population, or 
environment which cause them to respond with significantly 
different group loss rates. 






Given these theoretically based differences, along with
others less well understood or accepted, even the
casual observer would expect divergent attrition behavior
across the various MOSs and MOS groups. MOS appears to be a
logical and appropriate clustering variable which
intuitively should yield interesting categorizations.

Three variations of MOS groupings already exist in 
functional hierarchy. The lowest level is the four-digit 
MOS. The next degree is the occupational field group. 
These groups consist of all MOSs sharing similar first and 
second digits. Occupational field (OCCFLD) 34, Auditing, 
Finance, and Accounting, consists of MOS 3401, MOS 3402, and 
MOS 3415. Finally, the MOS groups described in Table 2 are 
the largest of the three groupings. These three categories 
provide the initial clustering elements for analysis. 

As with YCS, MOS or MOS groups become the row
identities in the loss rate matrix formation. Appendix C 
provides the computer program utilized in this project and 
Appendix D offers the subsequent loss matrices. 

RANK was the third designated clustering element. This 
characteristic is strongly associated with YCS but does 
offer a measure of officer performance. The utility of RANK 
as a performance measure is enhanced by the inclusion of 
failed-select status as a categorization. For the complete 
categorization of RANK, see Appendix A. 






Since the scope of this thesis does not cover warrant 
officers or LDOs, these ranks are eliminated from the RANK 
clustering. Appendix C includes the programming of the RANK 
variable. 

For the manpower manager, the selected clustering 
elements represent the most interesting descriptive aspects 
of the officer populations with regard to attrition 
behavior. RANK, YCS, and MOS are the major elements of 
management concern and are the natural cases to be used in 
the definition of new attrition rate aggregates. 

E. SIMILARITY MATRIX 

The hierarchical clustering method requires that every 
pair-wise combination of clustering variables be defined by 
a measure of similarity. Similarity is measured by the 
proximity or distance between entities. The process of 
similarity computation leads to the creation of a lower 
triangle similarity matrix. Figure 2 shows the similarity 
matrix. 

Numerous distance measures are available for use
in the creation of the similarity matrix. Lorr [Ref. 17:pp.
32-34] and Anderberg [Ref. 18:pp. 98-110] discuss various
distance functions referred to as metrics. The Chebychev
distance metric was selected as the measure for use in this
research and is represented as follows:

$$
D(X, Y) = \max_i \, | X_{ij} - Y_{ik} |
$$

where:

$X_{ij}$ = loss rate of the jth cell of the ith
variable

$Y_{ik}$ = loss rate of the kth cell of the ith
variable.

$$
\begin{array}{lllll}
s_{21} \\
s_{31} & s_{32} \\
s_{41} & s_{42} & s_{43} \\
\vdots & \vdots & \vdots & \ddots \\
s_{n1} & s_{n2} & s_{n3} & \cdots & s_{n(n-1)}
\end{array}
$$

Source: M.R. Anderberg, Cluster Analysis for
Applications (New York: Academic Press,
1973): 133, Figure 6.2.

Figure 2. Lower Triangle Similarity Matrix



The Chebychev metric measures the distance between 
entities as the maximum absolute difference in value for any 
one variable. When officer attrition behavior is
characterized by the previously mentioned rates of loss,
typically only one, or perhaps two, of the rates are
of interest at a particular career moment. It is these
singular rates that practically define the unique nature of
the individual cell. The outlying, or distinguishing, loss
rate is the primary ratio of interest, and it is best isolated
by using the Chebychev metric. For instance, the
most-recent-five-year loss type attrition rates for an infantry
captain (MOS 0302) are:

Loss type: Retire Release Discharge Resign Other 

Loss rate: .002 .012 .002 .030 .001 

Compare these to the rates of an aviator captain, who 
flies F-4 fighter aircraft: 

Loss type: Retire Release Discharge Resign Other 

Loss rate: .000 .053 .002 .032 .003 

In this situation the Release type loss rate is the aspect
of attrition that distinguishes the otherwise similar loss
behavior of these two categories of officers: the rates
differ by .041 for Release and by no more than .002 for any
other loss type. The Chebychev metric bases the calculation of
similarity on the maximum difference of loss rate types.
Thus the nature of the data suggests the Chebychev distance
metric.
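
The comparison of the two captain cells can be reproduced with a short Python sketch. The rates are those quoted above; the function and variable names are assumptions introduced for illustration.

```python
LOSS_TYPES = ("Retire", "Release", "Discharge", "Resign", "Other")

# Most-recent-five-year loss rates quoted in the text.
infantry_captain = (.002, .012, .002, .030, .001)
f4_captain = (.000, .053, .002, .032, .003)


def chebychev_distance(x, y):
    """Maximum absolute difference over the loss-type variables."""
    if len(x) != len(y):
        raise ValueError("cells must have the same number of variables")
    return max(abs(a - b) for a, b in zip(x, y))


def distinguishing_loss_type(x, y):
    """Name of the loss type that produces the Chebychev distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return LOSS_TYPES[diffs.index(max(diffs))]


distance = chebychev_distance(infantry_captain, f4_captain)
loss_type = distinguishing_loss_type(infantry_captain, f4_captain)
```

For these two cells the distance is the Release difference, |.012 - .053| = .041. A measure based on summed differences would mix this value with the small differences in the other four loss types.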

Alternatives to the Chebychev metric often are based on
the sums of differences between variables and would obscure
the most dramatic aspects of cell differences. Further, the
fact that the loss variables are of a binomial distribution
leading to unequal variances causes all measures based on
Euclidean distance to be inappropriate, e.g., squared
Euclidean distances, Manhattan distances, etc.

A sample calculation using the Chebychev distance metric 
is given below: 



     78      1      .027     .037       .005       .019     .002
     78      2      .029     .049       .006       .021     .002

$$
\begin{aligned}
D(1,2) &= \max_i \, | X_{ij} - Y_{ik} | \\
       &= \max \{\, |.027 - .029|,\ |.037 - .049|,\ |.005 - .006|,\ |.019 - .021|,\ |.002 - .002| \,\} \\
       &= \max \{\, .002,\ .012,\ .001,\ .002,\ .000 \,\} \\
       &= .012
\end{aligned}
$$



The hierarchical clustering technique is executed over 
the similarity matrix constructed of resultant distance 
measures. The SPSSx program allows for the specification of 
the Chebychev distance metric by subcommand in the procedure 
CLUSTER as described by Norusis [Ref. 20:pp. 184-185]. 
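
As an illustration of how the lower triangle matrix is filled, the following Python sketch computes the pairwise Chebychev distances for the three year-78 rows shown in Table 3 (groups 1, 2, and 12). It is a stand-alone sketch with assumed names and does not reproduce the SPSSx procedure.

```python
def lower_triangle(cells):
    """Pairwise Chebychev distances arranged as a lower triangle.

    Row r of the result (r = 0, 1, ...) holds the distances between
    cell r + 1 and every cell listed before it.
    """
    def chebychev(x, y):
        return max(abs(a - b) for a, b in zip(x, y))

    return [
        [chebychev(cells[i], cells[j]) for j in range(i)]
        for i in range(1, len(cells))
    ]


# Year-78 loss rates (Retire, Release, Discharge, Resign, Other)
# for MOS groups 1, 2, and 12 as shown in Table 3.
rows = [
    (.027, .037, .005, .019, .002),
    (.029, .049, .006, .021, .002),
    (.064, .012, .003, .005, .001),
]
triangle = lower_triangle(rows)
```

The first row holds the single value .012 from the sample calculation. Group 12 lies .037 from both of the other groups, driven by the Retire rate against group 1 and by the Release rate against group 2.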

F. CLUSTERING CRITERION 

Once the similarity matrix is defined, the choice of 
clustering criterion must be addressed. The clustering
criterion describes how the most similar clusters are to be
selected. This is the computational burden of the 
hierarchical clustering technique. 






Both Lorr [Ref. 17] and Anderberg [Ref. 18:pp. 134-145] 
offer a variety of clustering criterion options. Every 
clustering method is nominally unique and apart from every 
other method. However, many of the methods tend to yield 
substantially similar results. 

The method selected for this research is known as the
average linkage between groups method. This technique
evaluates the potential merger of all clusters in terms of
the average similarity of the links between the cluster pairs.

Initially, several alternative schemes were rejected as
inappropriate due to their association with squared
Euclidean distance metrics. Further, the simplest linkage
methods base clustering decisions on the minimum or
maximum distance between cluster members, e.g., the single
linkage and the complete linkage methods. To avoid such
dependency on extreme values for the definition of clusters,
a method using the average of all links of cluster pairs was
considered most useful and correct. Two such methods are
the average linkage between groups and the average linkage
within groups:

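$$
\text{Average linkage between groups} = \frac{S_{ij}}{N_i N_j}
$$

$$
\text{Average linkage within groups} = \frac{SUM_i + SUM_j + S_{ij}}{(N_i + N_j)(N_i + N_j - 1)/2}
$$

where:

$SUM_i$ = sum of all pairwise similarities among
entities within cluster i

$S_{ij}$ = sum of the pairwise similarities between the
entities of cluster i and those of cluster j

$N_i$ = the number of entities in cluster i.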

No theoretical considerations or technical explanations 
offer sufficient reason to select one method over the other. 
A test was therefore constructed to compare the clustering 
solutions using the two candidate methods. 

Twelve sets of seven or nine pairs of numbers from 0 to 
.50 were generated to simulate loss rates. The sets were 
clustered using the SPSSx CLUSTER procedure and the results
were plotted for comparison. 

As anticipated, the majority of the comparisons showed 
little if any difference in aggregation hierarchy. However, 
a few of the sets did show distinct differences and 
demonstrated important clustering trends. The average 
linkage within groups tended to cluster one or two distinct 
groups initially and quickly expand the existing clusters 
into higher levels of aggregation. The average linkage 
between groups tended to create more clusters initially and 
pool clusters into higher levels of aggregation later in the 
sequence. 

More clusters at the lowest level of an agglomerative 
hierarchy provide greater insight into data set 
relationships characterized by inherently small ratio 
differences. The tendency to establish more clusters 
initially was consistent with the needs of this project. 
Therefore, the clustering exhibited in the average linkage
between groups was preferred and the between groups method
was selected as the criterion for clustering. The test is
documented in Appendix E.

The SPSSx program allows for the average linkage between 
groups method to be specified by subcommand in the procedure 
CLUSTER as offered by Norusis [Ref. 20:pp. 184-185]. 
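
The agglomerative procedure with the between groups criterion can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: it uses Chebychev distances, recomputes every average at each stage for clarity rather than speed, and makes no claim to match the internal algorithm or the rescaled distances of the SPSSx CLUSTER procedure. All names and the sample data are hypothetical.

```python
from itertools import combinations


def chebychev(x, y):
    """Maximum absolute difference between two cells."""
    return max(abs(a - b) for a, b in zip(x, y))


def average_between(c1, c2, cells):
    """Mean distance over all pairs with one member in each cluster."""
    total = sum(chebychev(cells[a], cells[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(cells):
    """Merge the closest pair of clusters until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance)
    tuples, where each cluster is a tuple of case indices.
    """
    clusters = [(i,) for i in range(len(cells))]
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_between(pair[0], pair[1], cells),
        )
        history.append((a, b, average_between(a, b, cells)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


# Four hypothetical cells, each described by two loss rates.
cells = [(.020, .040), (.021, .041), (.100, .300), (.110, .310)]
history = agglomerate(cells)
```

For this sample the two low-rate cells merge first, the two high-rate cells merge second, and the two resulting clusters are pooled only at the final stage. The merge history carries the same information that a dendrogram displays.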

G. DENDROGRAM 

A final aspect of the hierarchical clustering analysis 
concerns the clustering result. As indicated earlier the 
dendrogram offers a convenient display of the clustering 
sequence and composition. It is desirable in this work to 
also measure the relative population sizes of the clusters 
represented in the aggregation. 

A separate program was created to calculate the 
cumulative population of the associated officers with each 
stage of the aggregation. Using SPSSx the calculation of 
cluster membership at specified stages of aggregation can be 
accomplished. Appendix F provides the program utilized. 
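
The population bookkeeping can be illustrated with a short Python sketch. It is not the Appendix F program; it assumes a merge history given as (cluster_a, cluster_b, distance) tuples of case indices, and the officer counts shown are hypothetical.

```python
def cluster_populations(history, populations):
    """Population of the cluster formed at each stage of aggregation.

    history: (cluster_a, cluster_b, distance) tuples, with each cluster
    given as a tuple of case indices.
    populations: number of officers in each case, indexed by case.
    """
    return [
        (a + b, sum(populations[i] for i in a + b))
        for a, b, _ in history
    ]


# Hypothetical three-stage merge history over four cases.
history = [((0,), (1,), .001), ((2,), (3,), .010), ((0, 1), (2, 3), .2645)]
stages = cluster_populations(history, populations=[120, 80, 40, 10])
```

Here the first stage forms a cluster of 200 officers, the second a cluster of 50, and the final stage pools all 250.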






V. APPLICATION OF CLUSTER ANALYSIS 



A. GENERAL 

Using the hierarchical cluster analysis methodologies 
and techniques described in the preceding chapter, loss rate 
matrices and dendrograms were computed and drawn for a 
variety of clustering strategies. Introductory loss rate 
clustering was conducted on the MOS groups from Table 2, as
well as the 47 OCCFLDs. The attrition rates of the MOS 
groups appeared to cluster as expected with aviation-type 
groups together and ground-type groups together, etc. 
However, when the loss rates of the OCCFLDs were clustered, 
over the entire population, unexpected relationships 
developed and many perceived similarities were found to be 
without statistical support. 

Length of service, as discussed previously, may be viewed
as the driving force behind attrition behavior. YCS, a
length-of-service surrogate, was clustered over the entire
population. The results of YCS aggregation demonstrated
that significant and consistent attrition behavior is
associated with various lengths of service. See Figure 3
for illustration of this point.

On inspection, the general, all-service YCS aggregation
in Figure 3 was found credible. Year four, for instance,
stands out as quite distinct in terms of attrition
behavior. The loss rates of Marine officers in their fourth
YCS do not cluster with other YCSs until the final stages.
This follows on the basis that the fourth YCS is the time
when the majority of initial service obligations are met and
a relatively large number of officers elect to leave the
Marine Corps. The appearance of the YCS aggregation
dendrogram follows well the factors of service obligation,
selection for promotion, and retirement opportunities.

[Figure: dendrogram using average linkage (between groups); cases are YCSs; horizontal axis: rescaled distance cluster combine, 0 to 25]

Figure 3. Entire Population YCS Aggregated Dendrogram

Exploratory clustering of RANK in various combinations 
across a variety of MOSs substantiated that attrition 
behavior is strongly associated with specific rank. Each 
level exhibited its own unique characteristics. Loss rates 
of captains were generally similar, attrition rates of 
failed-select majors were basically the same, etc.

Promotion to higher rank is largely a function of YCS 
and in practical terms demotion does not exist in the 
Service. Further, accelerated promotion seldom occurs and 
the advancement of officers through the rank of lieutenant 
colonel is fairly predictable. For these reasons RANK was 
not selected as an element for further cluster analysis. 
The situation of failed-selectees can be adequately 
addressed by the designation of the failed-select categories 
in cell definition, see Appendix A. 

With examination and comparison, the above clustering 
schemes led to the discovery of various relationships and 
the development of still more clustering approaches.
Inevitably, specific loss rate case outliers were encountered
which did not neatly fit into specified groups. In the
interest of time and expense, those that nearly qualified
were most often subjectively included into existing groups.
Outliers with great dissimilarities were individually
identified, investigated, and, as necessary, isolated.

The proposed replacement for the current small cell 
override methodology is presented in Table 4. A discussion 
of the development of this solution is embodied in the 
remainder of this chapter. 

B. SMALL CELL DEFINITION 

Prior to addressing the aggregation specifics, the small 
cell population threshold warrants attention. The small 
cell population threshold is the factor that determines the 
extent of aggregation which will occur when a small cell is 
encountered in the course of a problem involving MCORP. 

The small cell population threshold should remain a flexible
aspect of the MCORP model. The ability of the user to 
specify a minimum small cell population is a desirable 
feature of this process. Such control can be used to 
influence the conservativeness of small cell loss rate 
generation. 
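
The role of a user-specified threshold can be illustrated with a minimal Python sketch. It is not MCORP logic; it assumes only that the inventories of a cell and of each successively larger aggregate are available in order, and all names and values are hypothetical.

```python
def expand_until_threshold(level_populations, threshold):
    """Find the first aggregation level whose inventory meets the threshold.

    level_populations: inventory of the cell itself, followed by the
    inventory of each successively larger aggregate.
    Returns (level index, population); if no level qualifies, the
    largest aggregate is returned.
    """
    if not level_populations:
        raise ValueError("at least one aggregation level is required")
    for level, population in enumerate(level_populations):
        if population >= threshold:
            return level, population
    return len(level_populations) - 1, level_populations[-1]


# A 3-officer cell whose successive aggregates hold 12, 45, and 300 officers.
levels = [3, 12, 45, 300]
low = expand_until_threshold(levels, threshold=10)
high = expand_until_threshold(levels, threshold=30)
```

Raising the threshold from 10 to 30 pushes the same cell one level further up the hierarchy, so the resulting rate rests on more observations drawn from a broader and potentially less homogeneous group.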

Selection of a low inventory threshold results in rates 
reflective of relatively few observations in a narrow range 
of parameters. The potential for accurate loss estimation 
from these values exists but the risk of gross error in the 






